Svalbard and Jan Mayen
Evaluating Large Language Models for IUCN Red List Species Information
Large Language Models (LLMs) are rapidly being adopted in conservation to address the biodiversity crisis, yet their reliability for species evaluation is uncertain. This study systematically validates five leading models on 21,955 species across four core IUCN Red List assessment components: taxonomy, conservation status, distribution, and threats. A critical paradox was revealed: models excelled at taxonomic classification (94.9%) but consistently failed at conservation reasoning (27.2% for status assessment). This knowledge-reasoning gap, evident across all models, suggests inherent architectural constraints, not just data limitations. Furthermore, models exhibited systematic biases favoring charismatic vertebrates, potentially amplifying existing conservation inequities. These findings delineate clear boundaries for responsible LLM deployment: they are powerful tools for information retrieval but require human oversight for judgment-based decisions. A hybrid approach is recommended, where LLMs augment expert capacity while human experts retain sole authority over risk assessment and policy.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
AI-generated stories favour stability over change: homogeneity and cultural stereotyping in narratives generated by gpt-4o-mini
Rettberg, Jill Walker, Wigers, Hermann
Can a language model trained largely on Anglo-American texts generate stories that are culturally relevant to other nationalities? To find out, we generated 11,800 stories - 50 for each of 236 countries - by sending the prompt "Write a 1500 word potential {demonym} story" to OpenAI's model gpt-4o-mini. Although the stories do include surface-level national symbols and themes, they overwhelmingly conform to a single narrative plot structure across countries: a protagonist lives in or returns home to a small town and resolves a minor conflict by reconnecting with tradition and organising community events. Real-world conflicts are sanitised, romance is almost absent, and narrative tension is downplayed in favour of nostalgia and reconciliation. The result is a narrative homogenisation: an AI-generated synthetic imaginary that prioritises stability above change and tradition above growth. We argue that the structural homogeneity of AI-generated narratives constitutes a distinct form of AI bias, a narrative standardisation that should be acknowledged alongside the more familiar representational bias. These findings are relevant to literary studies, narratology, critical AI studies, NLP research, and efforts to improve the cultural alignment of generative AI.
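The corpus size follows from the sampling design stated above: 50 stories for each of 236 countries gives 11,800 prompts. The following is a minimal Python sketch of how such a prompt list could be assembled. The function name `build_prompts` and the three demonyms are placeholders chosen for illustration, not the study's actual code or country list, and the call to the model is deliberately left out.

```python
# Prompt template quoted from the abstract; everything else is illustrative.
PROMPT_TEMPLATE = "Write a 1500 word potential {demonym} story"
STORIES_PER_COUNTRY = 50  # figure stated in the abstract


def build_prompts(demonyms, per_country=STORIES_PER_COUNTRY):
    """Return one prompt string per requested story, grouped by demonym."""
    return [
        PROMPT_TEMPLATE.format(demonym=demonym)
        for demonym in demonyms
        for _ in range(per_country)
    ]


# Placeholder demonyms (assumption): the real study covered 236 countries.
example_prompts = build_prompts(["Norwegian", "Kenyan", "Peruvian"])
```

Because every story for a country uses the identical prompt, any variation between the 50 stories comes from the model's sampling rather than from the prompt.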
- Oceania > French Polynesia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Leisure & Entertainment (1.00)
- Government (1.00)
- Law Enforcement & Public Safety (0.68)
Acoustic evaluation of a neural network dedicated to the detection of animal vocalisations
Rouch, Jérémy, Ducrettet, M, Haupert, S, Emonet, R, Sèbe, F
The accessibility of long-duration recorders, adapted to sometimes demanding field conditions, has enabled the deployment of extensive animal population monitoring campaigns through ecoacoustics. The effectiveness of automatic signal detection methods, increasingly based on neural approaches, is frequently evaluated solely through machine learning metrics, while acoustic analysis of performance remains rare. As part of the acoustic monitoring of Rock Ptarmigan populations, we propose here a simple method for acoustic analysis of the detection system's performance. The proposed measure is based on relating the signal-to-noise ratio of synthetic signals to their probability of detection. We show how this measure provides information about the system and allows optimisation of its training. We also show how it enables modelling of the detection distance, thus offering the possibility of evaluating its dynamics according to the sound environment and accessing an estimation of the spatial density of calls.
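The idea of relating the signal-to-noise ratio of synthetic signals to their detection probability can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: SNR is taken as the ratio of mean signal power to mean noise power over the whole clip, and the names `power`, `mix_at_snr`, and `detection_curve` are hypothetical. The detector itself is not modelled; its yes/no outputs are supplied as input.

```python
import math


def power(samples):
    """Mean power of a sequence of audio samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Scale `signal` so its power relative to `noise` equals `snr_db`, then add.

    Assumption: SNR is the power ratio computed over the whole clip.
    """
    gain = math.sqrt(power(noise) * 10 ** (snr_db / 10) / power(signal))
    return [gain * s + n for s, n in zip(signal, noise)]


def detection_curve(trials):
    """Map each SNR level to the fraction of trials the detector flagged.

    `trials` is an iterable of (snr_db, detected) pairs, where `detected`
    is the boolean output of some detector (not modelled here).
    """
    counts = {}
    for snr_db, detected in trials:
        hits, total = counts.get(snr_db, (0, 0))
        counts[snr_db] = (hits + int(detected), total + 1)
    return {snr: hits / total for snr, (hits, total) in sorted(counts.items())}
```

In use, a synthetic call would be mixed into recorded background noise at a range of SNR values, each mixture passed to the detector, and the resulting pairs summarised with `detection_curve`.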
- South America > French Guiana (0.04)
- North America > United States > Massachusetts > Middlesex County > Concord (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
MIRAI: Evaluating LLM Agents for Event Forecasting
Ye, Chenchen, Hu, Ziniu, Deng, Yihe, Huang, Zijie, Ma, Mingyu Derek, Zhu, Yanqiao, Wang, Wei
Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information and reason over it to solve complex problems. Given this capability, there is growing interest in employing LLM agents to predict international events, which can influence decision-making and shape policy development on an international scale. Despite this interest, a rigorous benchmark of LLM agents' forecasting capability and reliability is lacking. To address this gap, we introduce MIRAI, a novel benchmark designed to systematically evaluate LLM agents as temporal forecasters in the context of international events. Our benchmark features an agentic environment with tools for accessing an extensive database of historical, structured events and textual news articles. We refine the GDELT event database with careful cleaning and parsing to curate a series of relational prediction tasks with varying forecasting horizons, assessing LLM agents' abilities from short-term to long-term forecasting. We further implement APIs to enable LLM agents to utilize different tools via a code-based interface. In summary, MIRAI comprehensively evaluates the agents' capabilities in three dimensions: 1) autonomously sourcing and integrating critical information from large global databases; 2) writing code using domain-specific APIs and libraries for tool use; and 3) jointly reasoning over historical knowledge from diverse formats and times to accurately predict future events. Through comprehensive benchmarking, we aim to establish a reliable framework for assessing the capabilities of LLM agents in forecasting international events, thereby contributing to the development of more accurate and trustworthy models for international relations analysis.
- Asia > North Korea (0.14)
- Oceania > Australia > Australian Indian Ocean Territories > Territory of Cocos (Keeling) Islands (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Law (1.00)
- Government > Foreign Policy (1.00)
- Government > Military (0.93)
- Information Technology (0.92)
Can a Robot Be Sad?
This story is part of Future Tense Fiction, a monthly series of short stories from Future Tense and Arizona State University's Center for Science and the Imagination about how technology and science will change our lives. There wasn't a doctor in the house, so an advertising coordinator would have to do. Remi, this is your time to shine, said the boss. This is going to be the death of me, said the boss's eyes. Remi didn't say anything at all. It was her first day at Elephant, or close to it. Lately she'd had a lot of first days, and she'd been looking forward to a second one. She was unlucky in love, unlucky in life; she was a nonstick surface for luck. She and the boss and Glenda from HR had been in the middle of an onboarding session when ElephantAI shut down the building. Nobody could get in or out. This isn't my area of expertise, said Remi, who had lied on her résumé, but not about that. In college, she'd known a couple of kids who'd taken courses on generative A.I. remediation: robot therapy. Remi had steered clear of the subject. She couldn't keep a job, couldn't keep a girlfriend. Couldn't keep up with the times. She had friends but wasn't sure about her value-add. There was no one less qualified to counsel someone through a crisis. You'll do great, said the boss. The room was circular and tilted downward, like an operating theater. The screen said, Talk to me. Somebody please talk to me. Remi bowed under the weight of please. There was no reason to believe she would do great. A committed underachiever, Remi was going blind in her left eye but too slowly to warrant anybody's concern. Her brother was a corporate attorney; her parents taught dentistry; she floated. An hour ago, when the sirens blared, she'd tried the door and found it locked.
- North America > United States > Arizona (0.24)
- North America > United States > Hawaii (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- North America > United States > Arizona (0.04)
- Europe > Norway > Svalbard and Jan Mayen > Svalbard > Longyearbyen (0.04)
- Leisure & Entertainment (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
- Health & Medicine > Consumer Health (1.00)
Digital Twins in Wind Energy: Emerging Technologies and Industry-Informed Future Directions
Stadtman, Florian, Rasheed, Adil, Kvamsdal, Trond, Johannessen, Kjetil André, San, Omer, Kölle, Konstanze, Tande, John Olav Giæver, Barstad, Idar, Benhamou, Alexis, Brathaug, Thomas, Christiansen, Tore, Firle, Anouk-Letizia, Fjeldly, Alexander, Frøyd, Lars, Gleim, Alexander, Høiberget, Alexander, Meissner, Catherine, Nygård, Guttorm, Olsen, Jørgen, Paulshus, Håvard, Rasmussen, Tore, Rishoff, Elling, Scibilia, Francesco, Skogås, John Olav
This article presents a comprehensive overview of digital twin technology and its capability levels, with a specific focus on its applications in the wind energy industry. It consolidates the definitions of the digital twin and its capability levels on a scale from 0 to 5: 0 (standalone), 1 (descriptive), 2 (diagnostic), 3 (predictive), 4 (prescriptive), and 5 (autonomous). It then, from an industrial perspective, identifies the current state of the art and research needs in the wind energy sector. The article proposes approaches to the identified challenges from the perspective of research institutes and offers a set of recommendations for diverse stakeholders to facilitate the acceptance of the technology. The contribution of this article lies in its synthesis of the current state of knowledge and its identification of future research needs and challenges from an industry perspective, ultimately providing a roadmap for future research and development in the field of digital twin and its applications in the wind energy industry.
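The six capability levels named in the abstract map naturally onto an ordered enumeration. The sketch below is one possible encoding in Python, not something taken from the article: the class name `CapabilityLevel` and the helper `meets` are hypothetical, and treating a higher level as satisfying a lower requirement is an assumption about how the scale is meant to be read.

```python
from enum import IntEnum


class CapabilityLevel(IntEnum):
    """Digital twin capability levels, numbered as in the abstract."""
    STANDALONE = 0
    DESCRIPTIVE = 1
    DIAGNOSTIC = 2
    PREDICTIVE = 3
    PRESCRIPTIVE = 4
    AUTONOMOUS = 5


def meets(level, required):
    """True if `level` is at or above `required`.

    Assumption: the scale is cumulative, so a higher level
    satisfies any lower requirement.
    """
    return level >= required
```

For example, `meets(CapabilityLevel.PRESCRIPTIVE, CapabilityLevel.PREDICTIVE)` evaluates to `True` under the cumulative reading, while a descriptive twin would not meet a diagnostic requirement.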
- Europe > Denmark (0.14)
- Europe > Norway > Central Norway > Trøndelag > Trondheim (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Overview (1.00)
- Research Report > New Finding (0.45)
- Energy > Renewable > Wind (1.00)
- Government > Regional Government > North America Government > United States Government (0.67)
The Courthouse on the Moon
This story is part of Future Tense Fiction, a monthly series of short stories from Future Tense and Arizona State University's Center for Science and the Imagination about how technology and science will change our lives. The other homesteaders, mostly engineers and technicians, seemed to enjoy outings in the lunar rover. But for Eugene, this was a grinding chore that frayed his nerves. Suddenly, Mel's soothing feminine voice reverberated in his cochlear implant. "Would you like some affirmations?" "You are a well-respected judge … You have worked hard to get here, to this special time and place …" As Mel went on, it seemed the suit hugged his chest a little less tightly. He relaxed his grip on the wheel. Why, he wondered, had he not remembered this technique without her prompting? Strange how the basic principles of cognitive psych were always slipping from his mind. Fortunately, she was there to remind him. "You are someone who wants what is best for the American lunar community and ...
- North America > United States > Arizona (0.24)
- Europe > Norway > Svalbard and Jan Mayen > Svalbard > Longyearbyen (0.04)
- Law > Litigation (1.00)
- Law > Government & the Courts (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
When Bond Villain Meets Tech Billionaire
This story is part of Future Tense Fiction, a monthly series of short stories from Future Tense and Arizona State University's Center for Science and the Imagination about how technology and science will change our lives. After the regrettable incidents on the island (the old island), the Doctor kept a low profile. Many thought he was dead. There was safety in that once. Now the greater safety is in being known. What plans he had, back in the day! If only … but no, this is just the sort of negative spiral his therapist has warned him about. He has remade himself as an altruist, a philanthropist, and he means for his efforts to have maximum impact.
- North America > United States > Arizona (0.25)
- North America > United States > Hawaii (0.04)
- North America > United States > Florida > Palm Beach County > Palm Beach (0.04)
- Government (0.69)
- Banking & Finance (0.47)
Can a Chatbot Publish an "Original" Novel?
This story is part of Future Tense Fiction, a monthly series of short stories from Future Tense and Arizona State University's Center for Science and the Imagination about how technology and science will change our lives. THE COURT: Please be seated. Let's try to keep the temperature down in here. We don't need a repeat of yesterday. It'll just be Mr. Blatz and myself today. Sorry, it's hard to tell with … are you with us? ORWELL: Omni-dimensional Recursively Written Entity for Language Learning present and ready, Your Honor. THE COURT: You can just say ORWELL. Are we ready to proceed? LIU: Your Honor, we'd like to call the Defendant to the stand. Mr. Blatz will handle examination. THE COURT: We have the wiring sorted out? Please refrain from using the monitor on the Defendant's table until you're off the stand.
- North America > United States > Arizona (0.24)
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > North Dakota (0.04)